
    Minimum error correction-based haplotype assembly: considerations for long read data

    The single nucleotide polymorphism (SNP) is the most widely studied type of genetic variation. A haplotype is defined as the sequence of alleles at SNP sites on each haploid chromosome. Haplotype information is essential in unravelling the genome-phenotype association. Haplotype assembly is a well-known approach for reconstructing haplotypes, exploiting reads generated by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often used for reconstruction of haplotypes from reads. However, problems with the MEC metric have been reported. Here, we investigate the MEC approach to demonstrate that it may result in incorrectly reconstructed haplotypes for devices that produce error-prone long reads. Specifically, we evaluate this approach for devices developed by Illumina, Pacific BioSciences and Oxford Nanopore Technologies. We show that imprecise haplotypes may be reconstructed with a lower MEC than that of the exact haplotype. The performance of MEC is explored for different coverage levels and error rates of data. Our simulation results reveal that in order to avoid incorrect MEC-based haplotypes, a coverage of 25 is needed for reads generated by Pacific BioSciences RS systems. Comment: 17 pages, 6 figures
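To make the MEC metric concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes biallelic SNPs coded 0/1, a diploid genome whose second haplotype is the complement of the first, and reads given as allele lists with `None` at uncovered sites. The function names `hamming` and `mec` and the toy reads are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence

# A read lists one allele (0 or 1) per SNP site; None marks an uncovered site.
Read = Sequence[Optional[int]]


def hamming(read: Read, hap: Sequence[int]) -> int:
    """Count mismatches between a read and a haplotype over covered sites."""
    return sum(1 for r, h in zip(read, hap) if r is not None and r != h)


def mec(reads: List[Read], hap: Sequence[int]) -> int:
    """MEC score of the haplotype pair (hap, complement of hap).

    Each read is assigned to whichever haplotype it matches best; the score
    is the total number of allele corrections this assignment requires.
    """
    comp = [1 - a for a in hap]
    return sum(min(hamming(r, hap), hamming(r, comp)) for r in reads)


# Toy data (hypothetical): the true haplotype and a few error-prone reads.
true_hap = [0, 0, 0, 0]
alt_hap = [0, 0, 1, 1]  # an incorrect candidate haplotype
reads: List[Read] = [
    [0, 0, 1, 1],        # two sequencing errors relative to the true pair
    [1, 1, 0, 0],        # two sequencing errors relative to the true pair
    [0, 0, 0, 0],        # error-free read
    [None, 0, 0, None],  # error-free read covering only sites 2 and 3
]

true_score = mec(reads, true_hap)
alt_score = mec(reads, alt_hap)
print("MEC of true haplotype:", true_score)
print("MEC of incorrect haplotype:", alt_score)
```

In this contrived case the incorrect haplotype scores 3 and the true one scores 4, so an optimizer that minimizes MEC alone would prefer the wrong answer. This is the kind of failure the abstract reports for error-prone reads at low coverage.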

    Influence of mutations of Val226 on the catalytic rate of haloalkane dehalogenase

    Haloalkane dehalogenase converts haloalkanes to their corresponding alcohols. The 3D structure, reaction mechanism and kinetic mechanism have been studied. The steady state kcat with 1,2-dichloroethane and 1,2-dibromoethane is limited mainly by the rate of release of the halide ion from the buried active-site cavity. During catalysis, the halogen that is cleaved off (Clα) from 1,2-dichloroethane interacts with Trp125 and the Clβ interacts with Phe172. Both these residues have van der Waals contacts with Val226. To establish the effect of these interactions on catalysis, and in an attempt to change enzyme activity without directly mutating residues involved in catalysis, we mutated Val226 to Gly, Ala and Leu. The Val226Ala and Val226Leu mutants had a 2.5-fold higher catalytic rate for 1,2-dibromoethane than the wild-type enzyme. A pre-steady state kinetic analysis of the Val226Ala mutant enzyme showed that the increase in kcat could be attributed to an increase in the rate of a conformational change that precedes halide release, causing a faster overall rate of halide dissociation. The kcat for 1,2-dichloroethane conversion was not elevated, although the rate of chloride release was also faster than in the wild-type enzyme. This was caused by a 3-fold decrease in the rate of formation of the alkyl-enzyme intermediate for 1,2-dichloroethane. Val226 seems to contribute to leaving group (Clα or Brα) stabilization via Trp125, and can influence halide release and substrate binding via an interaction with Phe172. These studies indicate that wild-type haloalkane dehalogenase is optimized for 1,2-dichloroethane, although 1,2-dibromoethane is a better substrate.

    Kinetic Characterization and X-ray Structure of a Mutant of Haloalkane Dehalogenase with Higher Catalytic Activity and Modified Substrate Range

    Conversion of halogenated aliphatics by haloalkane dehalogenase proceeds via the formation of a covalent alkyl-enzyme intermediate which is subsequently hydrolyzed by water. In the wild type enzyme, the slowest step for both 1,2-dichloroethane and 1,2-dibromoethane conversion is a unimolecular enzyme isomerization preceding rapid halide dissociation. Phenylalanine 172 is located in a helix-loop-helix structure that covers the active site cavity of the enzyme, interacts with the Clβ of 1,2-dichloroethane during catalysis, and could be involved in stabilization of this helix-loop-helix region of the cap domain of the enzyme. To obtain more information about the role of this residue in dehalogenase function, we performed a mutational analysis of position 172 and studied the kinetics and X-ray structure of the Phe172Trp enzyme. The Phe172Trp mutant had a 10-fold higher kcat/Km for 1-chlorohexane and a 2-fold higher kcat for 1,2-dibromoethane than the wild-type enzyme. The X-ray structure of the Phe172Trp enzyme showed a local conformational change in the helix-loop-helix region that covers the active site. This could explain the elevated activity for 1-chlorohexane of the Phe172Trp enzyme, since it allows this large substrate to bind more easily in the active site cavity. Pre-steady-state kinetic analysis showed that the increase in kcat found for 1,2-dibromoethane conversion could be attributed to an increase in the rate of an enzyme isomerization step that precedes halide release. The observed conformational difference between the helix-loop-helix structures of the wild-type enzyme and the faster mutant suggests that the isomerization required for halide release could be a conformational change that takes place in this region of the cap domain of the dehalogenase. It is proposed that Phe172 is involved in stabilization of the helix-loop-helix structure that covers the active site of the enzyme and creates a rigid hydrophobic cavity for small apolar halogenated alkanes.

    In silico assessment of a novel single-molecule protein fingerprinting method employing fragmentation and nanopore detection

    Summary: The identification of proteins at the single-molecule level would open exciting new venues in biological research and disease diagnostics. Previously, we proposed a nanopore-based method for protein identification called chop-n-drop fingerprinting, in which the fragmentation pattern induced and measured by a proteasome-nanopore construct is used to identify single proteins. In the simulation study presented here, we show that 97.1% of human proteome constituents are uniquely identified under close to ideal measuring circumstances, using a simple alignment-based classification method. We show that our method is robust against experimental error, as 69.4% can still be identified if the resolution is twice as low as currently attainable, and 10% of proteasome restriction sites and protein fragments are randomly ignored. Based on these results and our experimental proof of concept, we argue that chop-n-drop fingerprinting has the potential to make cost-effective single-molecule protein identification feasible in the near future.

    Caretta – A multiple protein structure alignment and feature extraction suite

    The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta's performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.

    Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data

    Background: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% identification accuracy at supra-species level and 78% accuracy for species level. Discussion: CNIDARIA allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design). Comment: 47 pages, 13 figures

    Three-dimensional Structure of L-2-Haloacid Dehalogenase from Xanthobacter autotrophicus GJ10 Complexed with the Substrate-analogue Formate

    The L-2-haloacid dehalogenase from the 1,2-dichloroethane degrading bacterium Xanthobacter autotrophicus GJ10 catalyzes the hydrolytic dehalogenation of small L-2-haloalkanoic acids to yield the corresponding D-2-hydroxyalkanoic acids. Its crystal structure was solved by the method of multiple isomorphous replacement with incorporation of anomalous scattering information and solvent flattening, and was refined at 1.95-Å resolution to an R factor of 21.3%. The three-dimensional structure is similar to that of the homologous L-2-haloacid dehalogenase from Pseudomonas sp. YL (1), but the X. autotrophicus enzyme has an extra dimerization domain, an active site cavity that is completely shielded from the solvent, and a different orientation of several catalytically important amino acid residues. Moreover, under the conditions used, a formate ion is bound in the active site. The position of this substrate-analogue provides valuable information on the reaction mechanism and explains the limited substrate specificity of the Xanthobacter L-2-haloacid dehalogenase.

    Topology of molecular interaction networks

    Molecular interactions are often represented as network models which have become the common language of many areas of biology. Graphs serve as convenient mathematical representations of network models and have themselves become objects of study. Their topology has been intensively researched over the last decade after evidence was found that they share underlying design principles with many other types of networks. Initial studies suggested that molecular interaction network topology is related to biological function and evolution. However, further whole-network analyses did not lead to a unified view on what this relation may look like, with conclusions highly dependent on the type of molecular interactions considered and the metrics used to study them. It is unclear whether global network topology drives function, as suggested by some researchers, or whether it is simply a byproduct of evolution or even an artefact of representing complex molecular interaction networks as graphs. Nevertheless, network biology has progressed significantly over the last years. We review the literature, focusing on two major developments. First, realizing that molecular interaction networks can be naturally decomposed into subsystems (such as modules and pathways), topology is increasingly studied locally rather than globally. Second, there is a move from a descriptive approach to a predictive one: rather than correlating biological network topology to generic properties such as robustness, it is used to predict specific functions or phenotypes. Taken together, this change in focus from globally descriptive to locally predictive points to new avenues of research. In particular, multi-scale approaches are developments promising to drive the study of molecular interaction networks further.